Windows Azure : Common Storage Tasks - Modeling Data

10/22/2010 6:10:05 PM

For folks used to modeling data in the RDBMS world, losing familiar tools such as foreign keys and joins in the Azure table world can be a bit of a culture shock. Basic data modeling is where many people first run into trouble.
Note: Steve Marx from the Windows Azure team suggested that some of the information in this section be included in this book, and his code was the source of ideas for some of the sample code shown here.

1. One-to-Many

When you model data, you often have a parent-child, or one-to-many, relationship. A canonical example is a customer-order data model, as shown in Figure 1: a customer “has” many orders, and you often do lookups where you want to get all orders belonging to a single customer.

Let’s turn the diagram shown in Figure 1 into a model in Azure tables. The following code shows a simple Customer data model. There’s nothing fancy about it: it just represents some sample properties on the Customer and picks an arbitrary partitioning scheme (customers are partitioned by company):

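    using Microsoft.WindowsAzure.StorageClient;

    class Customer : TableServiceEntity
    {
        public Customer(string name, string id, string company,
                        string address) : base(company, id)
        {
            this.Name = name;
            this.ID = id;
            this.Company = company;
            this.Address = address;

            // Partition by company, key by customer ID (the base
            // constructor already set these; repeated for clarity)
            this.PartitionKey = this.Company;
            this.RowKey = this.ID;
        }

        // Parameterless constructor used by the client library when it
        // materializes entities from query results
        public Customer() { }

        public string Name { get; set; }
        public string Company { get; set; }
        public string Address { get; set; }
        public string ID { get; set; }
    }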

Figure 1. One-to-many relationship

Similarly, let’s define an Order entity. You want to look up a given customer’s orders quickly, so the customer ID is a natural choice of partition key, since you will always have it on hand when querying for that customer’s orders.

The following code shows an Order entity that takes the customer ID it “belongs” to, as well as some other properties. For those of you who are used to specifying foreign keys, note that the “foreign key” relationship between customer and order is implicit: a CustomerID is specified when every order is created. However, as mentioned previously, there is no referential integrity checking across tables. You could happily delete a customer, and the table service won’t warn you about the dangling, orphaned orders left behind:

    class Order : TableServiceEntity
    {
        public Order(string customerID, string orderID, string orderDetails)
            : base(customerID, orderID)
        {
            this.CustomerID = customerID;
            this.OrderID = orderID;
            this.OrderDetails = orderDetails;

            // All of a customer's orders share one partition (the customer ID);
            // the order ID is the row key
            this.PartitionKey = CustomerID;
            this.RowKey = OrderID;
        }

        // Needed so that query results can be materialized into Order objects
        public Order() { }

        public string CustomerID { get; set; }
        public string OrderID { get; set; }
        public string OrderDetails { get; set; }
    }
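Because the table service will not stop you from deleting a customer who still has orders, keeping the two tables consistent is the application’s job. The following is a minimal in-memory sketch of that idea, not StorageClient code. A hypothetical `CustomerOrderStore` class uses a set and a dictionary as stand-ins for the Customer and Order tables, keyed as above. It deletes a customer’s orders before the customer and reports orders whose customer does not exist.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var store = new CustomerOrderStore();
store.AddCustomer("contoso", "c1");
store.AddOrder("c1", "o1");
store.AddOrder("c1", "o2");
store.AddOrder("c2", "o3"); // accepted even though customer c2 does not exist

Console.WriteLine(string.Join(",", store.FindOrphanedOrders())); // o3
store.DeleteCustomer("contoso", "c1");
Console.WriteLine(store.OrderCount); // 1 (only the orphaned o3 remains)

// Hypothetical in-memory stand-ins for the Customer and Order tables.
class CustomerOrderStore
{
    // Customer table: PartitionKey = company, RowKey = customer ID.
    private readonly HashSet<(string Company, string Id)> customers = new();

    // Order table: PartitionKey = customer ID, RowKey = order ID.
    private readonly Dictionary<string, SortedSet<string>> ordersByCustomer = new();

    public int OrderCount => ordersByCustomer.Values.Sum(orders => orders.Count);

    public void AddCustomer(string company, string id) => customers.Add((company, id));

    // Like the table service, no check is made that the customer exists.
    public void AddOrder(string customerId, string orderId)
    {
        if (!ordersByCustomer.TryGetValue(customerId, out var orders))
            ordersByCustomer[customerId] = orders = new SortedSet<string>(StringComparer.Ordinal);
        orders.Add(orderId);
    }

    // Application-enforced "cascade delete": drop the orders first, then the customer.
    public void DeleteCustomer(string company, string id)
    {
        ordersByCustomer.Remove(id);
        customers.Remove((company, id));
    }

    // Orders whose partition key matches no customer's row key.
    public IEnumerable<string> FindOrphanedOrders()
    {
        var knownIds = new HashSet<string>(customers.Select(c => c.Id));
        return ordersByCustomer
            .Where(p => !knownIds.Contains(p.Key))
            .SelectMany(p => p.Value);
    }
}
```

Against the real service, each order is a separate delete operation, and the two tables cannot be updated in a single transaction. Deleting the orders first means an interruption between the steps leaves a customer with missing orders rather than orphaned orders.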

The final piece of the puzzle is to get all orders pertaining to a given customer. There are a few ways you can do this.

The first is to store all the OrderIDs for a given Customer as a property of the Customer object, in the form of a serialized list. This has the advantage of avoiding a second query: when you get back the Customer object, you already have the list of orders as well. However, this approach breaks down for customers with large numbers of orders, because you can store only a limited number of IDs before you run into the size limits on entities.
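To make the size limit concrete, here is a small sketch of the serialized-list approach. The `OrderIdList` helper is hypothetical. It assumes a 64 KB cap on a single string property, with strings stored as UTF-16 (2 bytes per character), so check the current table service limits before relying on that number. The demo keeps appending 36-character GUID order IDs, comma-separated, until the serialized value no longer fits.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Keep appending GUID order IDs until the serialized list no longer fits.
var ids = new List<string>();
while (true)
{
    string next = Guid.NewGuid().ToString();
    if (!OrderIdList.Fits(OrderIdList.Serialize(ids.Append(next)))) break;
    ids.Add(next);
}
Console.WriteLine($"{ids.Count} order IDs fit in one property"); // 885 under the assumed cap

// Hypothetical helper for storing a customer's order IDs in one string property.
static class OrderIdList
{
    // Assumed cap on a single string property: 64 KB, stored as UTF-16.
    public const int MaxPropertyBytes = 64 * 1024;

    // Comma-separated; safe only because the IDs never contain commas.
    public static string Serialize(IEnumerable<string> orderIds) => string.Join(",", orderIds);

    public static List<string> Deserialize(string value) =>
        string.IsNullOrEmpty(value) ? new List<string>() : value.Split(',').ToList();

    // UTF-16 uses 2 bytes for each of these ASCII characters.
    public static bool Fits(string serialized) =>
        Encoding.Unicode.GetByteCount(serialized) <= MaxPropertyBytes;
}
```

Under these assumptions, 885 IDs fit (885 × 37 − 1 = 32,744 characters, or 65,488 bytes). A moderately active customer would outgrow the property, which is why a separate query scales better.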

A better model is to add a helper method to the Customer entity class that looks up all the Order entities associated with it. This costs an extra query, but it scales to any number of orders. The following code shows the modified Customer class. It assumes a data service context class, CustomerOrderDataServiceContext (not shown), that exposes queryable properties for the Customer and Order tables.

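    using System.Collections.Generic;
    using System.Linq;
    using Microsoft.WindowsAzure.StorageClient;

    class Customer : TableServiceEntity
    {
        public Customer(string name, string id, string company,
                        string address) : base(company, id)
        {
            this.Name = name;
            this.ID = id;
            this.Company = company;
            this.Address = address;

            this.PartitionKey = this.Company;
            this.RowKey = this.ID;
        }

        public Customer() { }

        public string Name { get; set; }
        public string Company { get; set; }
        public string Address { get; set; }
        public string ID { get; set; }

        // Orders are partitioned by customer ID, so this is a
        // single-partition query
        public IEnumerable<Order> GetOrders()
        {
            var query = from o in new CustomerOrderDataServiceContext().OrderTable
                        where o.PartitionKey == this.ID
                        select o;

            // AsTableServiceQuery() follows continuation tokens, so customers
            // with many orders get all of them, not just the first page
            return query.AsTableServiceQuery();
        }
    }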
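To see why `GetOrders` stays cheap, it helps to model what the query touches. The sketch below is an in-memory approximation, not the table service API. A hypothetical `OrderTable` class keeps one bucket per partition key (the customer ID), with rows sorted by row key (the order ID), mirroring the way the table service returns entities in PartitionKey, RowKey order. Filtering on the partition key reads exactly one bucket, no matter how many other customers exist.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var orders = new OrderTable();
orders.Insert(new Order("cust-42", "order-002", "2 widgets"));
orders.Insert(new Order("cust-42", "order-001", "1 gadget"));
orders.Insert(new Order("cust-7", "order-003", "3 sprockets"));

foreach (var o in orders.GetOrders("cust-42"))
    Console.WriteLine($"{o.OrderID}: {o.OrderDetails}");
// order-001: 1 gadget
// order-002: 2 widgets

record Order(string CustomerID, string OrderID, string OrderDetails);

// Hypothetical stand-in for the Order table: one bucket per PartitionKey
// (customer ID), rows kept sorted by RowKey (order ID).
class OrderTable
{
    private readonly Dictionary<string, SortedDictionary<string, Order>> partitions = new();

    public void Insert(Order order)
    {
        if (!partitions.TryGetValue(order.CustomerID, out var rows))
            partitions[order.CustomerID] = rows = new SortedDictionary<string, Order>(StringComparer.Ordinal);
        rows.Add(order.OrderID, order); // throws on a duplicate row key, like an insert conflict
    }

    // Equivalent of "where o.PartitionKey == customerId": only one partition is read.
    public IEnumerable<Order> GetOrders(string customerId) =>
        partitions.TryGetValue(customerId, out var rows) ? rows.Values : Enumerable.Empty<Order>();
}
```

Because every order for a customer shares one partition key, the query never touches other customers’ data. Its cost grows only with that customer’s own order count.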

2. Many-to-Many

Another common scenario in modeling data is a many-to-many relationship. This is best explained with the help of a sample model, such as the one shown in Figure 2. This model could form the basis of many social networking sites. It shows two entities, Friend and Group, that have a many-to-many relationship with each other. There can be many friends in a single group (example groups being “School,” “College,” “Work,” and “Ex-boyfriends”), and a friend can be in many groups (“Work” and “Ex-boyfriends”).

The application may want to traverse this relationship in either direction. Given a friend, it might want to display all groups she belongs to. Similarly, given a group, it might want to list all the friends you have in it.

Let’s start by creating Friend and Group entity classes. Each defines a few properties and a straightforward partitioning scheme. The partitioning scheme isn’t important for the discussion here:

    using Microsoft.WindowsAzure.StorageClient;

    class Friend : TableServiceEntity
    {
        public string Name { get; set; }
        public string FriendID { get; set; }
        public string Details { get; set; }

        public Friend(string id, string name, string details) : base(name, id)
        {
            this.Name = name;
            this.FriendID = id;
            this.Details = details;

            // Friends are partitioned by name and keyed by ID
            this.PartitionKey = Name;
            this.RowKey = FriendID;
        }

        public Friend() { }
    }

    class Group : TableServiceEntity
    {
        public string Name { get; set; }
        public string GroupID { get; set; }

        public Group(string name, string id)
            : base(id, id)
        {
            this.Name = name;
            this.GroupID = id;

            // Each group gets its own partition
            this.PartitionKey = id;
            this.RowKey = id;
        }

        public Group() { }
    }

Figure 2. Many-to-many relationship

How do you now represent the relationship between Friend and Group? The best way to deal with this is to create a separate “join” table that contains one entity per friend-group pair. To look up all friends in a group, you just query this table for that specific GroupID (and vice versa, to find all groups a friend belongs to). The following code shows the entity class for this table:

    class FriendGroupRelationship : TableServiceEntity
    {
        public string FriendID { get; set; }
        public string GroupID { get; set; }

        public FriendGroupRelationship(string friendID, string groupID)
            : base(friendID, groupID)
        {
            this.FriendID = friendID;
            this.GroupID = groupID;

            // One partition per friend; each row is one group membership
            this.PartitionKey = FriendID;
            this.RowKey = GroupID;
        }

        public FriendGroupRelationship() { }
    }

This code partitions the join table by FriendID. As a result, querying for all the groups a friend belongs to is fast (it reads a single partition), while the reverse query, all friends in a group, has to scan every partition. If your application cares more about quickly displaying all friends in a group, you can pick the reverse partitioning scheme, or create two tables with two different partitioning schemes, one for each scenario.
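The trade-off can be made visible by counting how many join entities each lookup has to examine. In this hypothetical in-memory `RelationTable`, partitioned by FriendID as above, listing a friend’s groups reads one partition, while listing a group’s members visits every partition because GroupID is only a row key. The “examined” counts are a rough model of scan cost, not figures measured against the table service.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var relations = new RelationTable();
relations.Add("picard", "captains");
relations.Add("picard", "starfleet");
relations.Add("janeway", "captains");
relations.Add("riker", "starfleet");

var (groups, groupCost) = relations.GroupsOf("picard");
var (friends, friendCost) = relations.FriendsIn("captains");
Console.WriteLine($"{string.Join(",", groups)} after examining {groupCost} entities");   // captains,starfleet after examining 2 entities
Console.WriteLine($"{string.Join(",", friends)} after examining {friendCost} entities"); // janeway,picard after examining 4 entities

// Hypothetical in-memory join table: PartitionKey = FriendID, RowKey = GroupID.
class RelationTable
{
    private readonly SortedDictionary<string, SortedSet<string>> byFriend = new(StringComparer.Ordinal);

    public void Add(string friendId, string groupId)
    {
        if (!byFriend.TryGetValue(friendId, out var groups))
            byFriend[friendId] = groups = new SortedSet<string>(StringComparer.Ordinal);
        groups.Add(groupId);
    }

    // Partition lookup: only this friend's partition is read.
    public (List<string> Groups, int Examined) GroupsOf(string friendId)
    {
        var groups = byFriend.TryGetValue(friendId, out var g) ? g.ToList() : new List<string>();
        return (groups, groups.Count);
    }

    // No partition key to filter on: every entity in every partition is examined.
    public (List<string> Friends, int Examined) FriendsIn(string groupId)
    {
        int examined = 0;
        var friends = new List<string>();
        foreach (var (friendId, groups) in byFriend)
        {
            examined += groups.Count;
            if (groups.Contains(groupId)) friends.Add(friendId);
        }
        return (friends, examined);
    }
}
```

A second copy of the table partitioned by GroupID would turn `FriendsIn` into a single-partition read as well. The price is writing every relationship twice.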

Note that when you create a new Friend or Group and relate the two, you must also add an entity to the join table. Likewise, when you delete a friend or a group, you must remove its join entities yourself. The following snippet shows what adding a new friend to a group might look like:

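    // Guid.NewGuid() generates a unique ID; new Guid() would return
    // Guid.Empty (all zeros) every time
    var id = Guid.NewGuid().ToString();
    var friend = new Friend(id,
        "Jean Luc Picard", "Captain, U.S.S. Enterprise");

    // Add Picard to a group by creating a join-table entity
    var friendgrouprelation = new
        FriendGroupRelationship(id, "captains");
    context.AddObject("Friend", friend);
    context.AddObject("FriendGroupRelationship",
        friendgrouprelation);

    // Nothing is sent to the table service until SaveChanges is called
    context.SaveChanges();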
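Deletion is the other half of keeping the join table honest. This minimal in-memory sketch uses a hypothetical `FriendGraph` class that follows the two-table option described earlier: one copy of the join data is partitioned by FriendID and one by GroupID. Adding a friend to a group writes to both copies. Deleting a friend removes its join entities from both copies before removing the friend, so no dangling relationships remain. Deleting a group would work symmetrically.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var graph = new FriendGraph();
string picard = Guid.NewGuid().ToString();
graph.AddFriend(picard, "Jean Luc Picard");
graph.AddToGroup(picard, "captains");
graph.AddToGroup(picard, "starfleet");

graph.DeleteFriend(picard);
Console.WriteLine(graph.JoinEntityCount); // 0: no dangling join entities remain

// Hypothetical in-memory model: a Friend table plus two copies of the join table.
class FriendGraph
{
    private readonly Dictionary<string, string> friends = new();                  // FriendID -> Name
    private readonly Dictionary<string, HashSet<string>> groupsByFriend = new();  // PartitionKey = FriendID
    private readonly Dictionary<string, HashSet<string>> friendsByGroup = new();  // PartitionKey = GroupID

    // Total join entities across both copies.
    public int JoinEntityCount =>
        groupsByFriend.Values.Sum(s => s.Count) + friendsByGroup.Values.Sum(s => s.Count);

    public void AddFriend(string id, string name) => friends.Add(id, name);

    public void AddToGroup(string friendId, string groupId)
    {
        // The table service would not check this; the application has to.
        if (!friends.ContainsKey(friendId))
            throw new InvalidOperationException($"Unknown friend {friendId}");
        GetOrCreate(groupsByFriend, friendId).Add(groupId);
        GetOrCreate(friendsByGroup, groupId).Add(friendId);
    }

    public IEnumerable<string> FriendsIn(string groupId) =>
        friendsByGroup.TryGetValue(groupId, out var f) ? f : Enumerable.Empty<string>();

    public void DeleteFriend(string friendId)
    {
        // Remove the join entities from both copies before the friend itself.
        if (groupsByFriend.Remove(friendId, out var groups))
            foreach (var g in groups)
                friendsByGroup[g].Remove(friendId);
        friends.Remove(friendId);
    }

    private static HashSet<string> GetOrCreate(Dictionary<string, HashSet<string>> table, string key)
    {
        if (!table.TryGetValue(key, out var set))
            table[key] = set = new HashSet<string>();
        return set;
    }
}
```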